 text-to-image generator


Enhancing Multimodal Misinformation Detection by Replaying the Whole Story from Image Modality Perspective

Wang, Bing, Li, Ximing, Wang, Yanjun, Li, Changchun, Wu, Lin Yuanbo, Wang, Buyu, Wang, Shengsheng

arXiv.org Artificial Intelligence

Multimodal Misinformation Detection (MMD) refers to the task of detecting social media posts involving misinformation, where a post often contains both text and image modalities. However, from observing MMD posts, we hold that the text modality may be much more informative than the image modality, because the text generally describes the whole event or story of the post while the image often presents only partial scenes. Our preliminary empirical results indicate that the image modality indeed contributes less to MMD. Building on this idea, we propose a new MMD method named RETSIMD. Specifically, we assume that each text can be divided into several segments, each describing a partial scene that could be presented by an image. Accordingly, we split the text into a sequence of segments and feed these segments into a pre-trained text-to-image generator to produce a sequence of augmented images. We further incorporate two auxiliary objectives concerning text-image and image-label mutual information, and post-train the generator on an auxiliary text-to-image generation benchmark dataset. Additionally, we propose a graph structure by defining three heuristic relationships between images, and use a graph neural network to generate the fused features. Extensive empirical results validate the effectiveness of RETSIMD.
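
The segment, generate, and fuse pipeline described above can be pictured with a toy sketch. It is not RETSIMD itself: the abstract does not say how text is split, which generator or image encoder is used, or what the three heuristic relationships are. The sketch assumes sentence-level segmentation, uses a hypothetical `fake_image_features` function as a stand-in for "generate an image, then encode it", and replaces the three relationships with a single hypothetical one (consecutive segments are linked), followed by one round of mean-aggregation message passing as a generic graph neural network layer.

```python
import re
from typing import List


def split_into_segments(text: str) -> List[str]:
    # Assumption: one segment per sentence; the paper's actual splitting rule is not given.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def fake_image_features(segment: str, dim: int = 4) -> List[float]:
    # Hypothetical stand-in for "generate an image from the segment, then encode it".
    # Deterministic toy features from character codes, for illustration only.
    feats = [0.0] * dim
    for i, ch in enumerate(segment):
        feats[i % dim] += ord(ch) / 1000.0
    return feats


def build_sequential_graph(n: int) -> List[List[int]]:
    # Hypothetical relationship: each image node links to itself and its
    # immediate neighbours in the original text order.
    return [[j for j in (i - 1, i, i + 1) if 0 <= j < n] for i in range(n)]


def mean_aggregate(features: List[List[float]], neighbours: List[List[int]]) -> List[List[float]]:
    # One message-passing step: a node's new feature is the mean of its neighbours' features.
    dim = len(features[0])
    out = []
    for nbrs in neighbours:
        summed = [sum(features[j][d] for j in nbrs) for d in range(dim)]
        out.append([s / len(nbrs) for s in summed])
    return out


def fused_post_feature(text: str) -> List[float]:
    segments = split_into_segments(text)
    if not segments:
        raise ValueError("text contains no segments")
    feats = [fake_image_features(s) for s in segments]
    node_feats = mean_aggregate(feats, build_sequential_graph(len(segments)))
    # Mean-pool node features into one post-level vector for a downstream classifier.
    dim = len(node_feats[0])
    return [sum(v[d] for v in node_feats) / len(node_feats) for d in range(dim)]
```

In a real system the toy features would come from an image encoder applied to generated images, and the fused vector would be combined with text features before classification; both steps are outside the scope of this sketch.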


Text-to-Image Generation for Vocabulary Learning Using the Keyword Method

Attygalle, Nuwan T., Kljun, Matjaž, Quigley, Aaron, Pucihar, Klen Čopič, Grubert, Jens, Biener, Verena, Leiva, Luis A., Yoneyama, Juri, Toniolo, Alice, Miguel, Angela, Kato, Hirokazu, Weerasinghe, Maheshya

arXiv.org Artificial Intelligence

The 'keyword method' is an effective technique for learning the vocabulary of a foreign language. It involves creating a memorable visual link between what a word means and what its pronunciation sounds like in the learner's native language. However, these memorable visual links remain implicit in people's minds and are not easy to remember for a large set of words. To enhance the memorisation and recall of vocabulary, we developed an application that combines the keyword method with text-to-image generators to externalise the memorable visual links as images. These visuals provide additional stimuli during the memorisation process. To explore the effectiveness of this approach, we first ran a pilot study to investigate how difficult it is to externalise descriptions of the mental visualisations of memorable links, by asking participants to write them down. We used these descriptions as prompts for a text-to-image generator (DALL-E2) to convert them into images and asked participants to select their favourites. Next, we compared different text-to-image generators (DALL-E2, Midjourney, Stable and Latent Diffusion) to evaluate the perceived quality of the images each generated. Despite heterogeneous results, participants mostly preferred images generated by DALL-E2, which was also used for the final study. In this study, we investigated whether providing such images enhances the retention of the vocabulary being learned, compared to the keyword method alone. Our results indicate that people did not encounter difficulties describing their visualisations of memorable links and that providing corresponding images significantly improves memory retention.
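
A keyword-method link of the kind the application externalises can be held in a small record whose learner-written description doubles as the image prompt. This is an illustrative sketch only: the `KeywordCard` type, its fields, and the prompt template are assumptions, not the authors' application, and the actual request to DALL-E2 is deliberately left out. The Spanish word "pato" ("duck"), linked to the English sound-alike "pot", is a commonly cited keyword-method example.

```python
from dataclasses import dataclass


@dataclass
class KeywordCard:
    foreign_word: str  # word being learned, e.g. Spanish "pato"
    meaning: str       # its meaning in the learner's native language
    keyword: str       # native-language word that sounds like the foreign word
    visual_link: str   # learner's own description of the memorable mental image

    def to_prompt(self) -> str:
        # Hypothetical template: the learner's description is the core of the prompt,
        # with the meaning and keyword appended so both halves of the link appear.
        return f"{self.visual_link} (depicting '{self.meaning}' and '{self.keyword}')"


card = KeywordCard(
    foreign_word="pato",
    meaning="duck",
    keyword="pot",
    visual_link="A duck sitting inside a cooking pot",
)
prompt = card.to_prompt()
# The prompt would then be sent to a text-to-image generator (call omitted here).
```

Keeping the learner's own wording as the main part of the prompt reflects the study's premise that the generated image should externalise that learner's mental picture rather than a generic one.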


Why A.I. Isn't Going to Make Art

The New Yorker

In 1953, Roald Dahl published "The Great Automatic Grammatizator," a short story about an electrical engineer who secretly desires to be a writer. One day, after completing construction of the world's fastest calculating machine, the engineer realizes that "English grammar is governed by rules that are almost mathematical in their strictness." He constructs a fiction-writing machine that can produce a five-thousand-word short story in thirty seconds; a novel takes fifteen minutes and requires the operator to manipulate handles and foot pedals, as if he were driving a car or playing an organ, to regulate the levels of humor and pathos. The resulting novels are so popular that, within a year, half the fiction published in English is a product of the engineer's invention. Is there anything about art that makes us think it can't be created by pushing a button, as in Dahl's imagination?


Prompting the E-Brushes: Users as Authors in Generative AI

Mei, Yiyang

arXiv.org Artificial Intelligence

Since its introduction in 2022, Generative AI has significantly impacted the art world, from winning state art fairs to creating complex videos from simple prompts. Amid this renaissance, a pivotal issue emerges: should users of Generative AI be recognized as authors eligible for copyright protection? The Copyright Office, in its March 2023 Guidance, argues against this notion. By comparing prompts to clients' instructions for commissioned art, the Office denies users authorship due to their limited role in the creative process. This Article challenges this viewpoint and advocates for the recognition of Generative AI users who incorporate these tools into their creative endeavors. It argues that the current policy fails to consider the intricate and dynamic interaction between Generative AI users and the models, where users actively influence the output through a process of adjustment, refinement, selection, and arrangement. Rather than dismissing the contributions generated by AI, this Article suggests a simplified and streamlined registration process that acknowledges the role of AI in creation. This approach not only aligns with the constitutional goal of promoting the progress of science and useful arts but also encourages public engagement in the creative process, which contributes to the pool of training data for AI. Moreover, it advocates for a flexible framework that evolves alongside technological advancements while ensuring safety and public interest. In conclusion, by examining text-to-image generators and addressing misconceptions about Generative AI and user interaction, this Article calls for a regulatory framework that adapts to technological developments and safeguards public interests.


Using Text-to-Image Generation for Architectural Design Ideation

Paananen, Ville, Oppenlaender, Jonas, Visuri, Aku

arXiv.org Artificial Intelligence

The recent progress of text-to-image generation has been recognized in architectural design. Our study is the first to investigate the potential of text-to-image generators in supporting creativity during the early stages of the architectural design process. We conducted a laboratory study with 17 architecture students, who developed a concept for a culture center using three popular text-to-image generators: Midjourney, Stable Diffusion, and DALL-E. Through standardized questionnaires and group interviews, we found that image generation could be a meaningful part of the design process when design constraints are carefully considered. Generative tools support serendipitous discovery of ideas and an imaginative mindset, enriching the design process. We identified several challenges of image generators and provided considerations for software developers and educators to support creativity and emphasize designers' imaginative mindset. By understanding the limitations and potential of text-to-image generators, architects and designers can leverage this technology in their design process and education, facilitating innovation and effective communication of concepts.


Viral Donald Trump Arrest + Escape Photos Explained!

#artificialintelligence

The viral Donald Trump arrest and prison escape photos are hilarious, and were based on a famous movie! Take a look for yourself, and learn about the story behind these viral photos! BREAKING: Donald Trump was just arrested by New York law enforcement. The truth is that these viral Donald Trump arrest photos are fake: they were generated by artificial intelligence and are really based on a famous Hollywood movie! First, let me just confirm that all those viral photos of police officers trying to arrest Donald Trump are fake and were generated by artificial intelligence.


Google may integrate AI text-to-image generator to Gboard for Android

#artificialintelligence

Google is expected to introduce a host of AI features across its products in the near future; among them, Gboard for Android is working to integrate the Imagen text-to-image generator, media reported. In a recent APK (Android Package Kit) teardown conducted by 9to5Google, the latest beta version of Gboard was found to contain lines of code that mention an "Imagen Keyboard". This Imagen feature will appear in the shortcuts strip/page, alongside Clipboard, Translate, and One-handed. For people unfamiliar with Imagen, it is similar to the popular text-to-image generator DALL-E 2, developed by ChatGPT creator OpenAI. It can create images based on the requests users submit to it, according to the report. Google's own research also found that more people preferred Imagen's results over DALL-E's.


What DALL-E reveals about human creativity

#artificialintelligence

The often delightful and arresting images created by the latest generation of text-to-image generators, exemplified by DALL-E 2, Midjourney, and Stable Diffusion, have stirred up lots of buzz in both the arts and the AI worlds. The images, generated from simple text prompts (e.g., a baboon sailing a colorful dinghy), look very much like the products of intelligent human creativity. To explore just how creative these models really are and what they can teach us about the nature of our own innovative propensities, we asked four authorities on artificial intelligence, the brain, and creativity (and we also asked GPT-3, a language-generating model that's a close cousin to DALL-E) to explain what they think of DALL-E's capabilities and artistic potential. DALL-E starts by taking billions of bits of text from the internet and translating them into an abstraction, which it stores in a location in "latent," or logical, space. In the universe of describable things, for example, "baboon" will be "located" by strong associations near to other primates, probably not far from "Africa," "savanna," or "zoo."
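
The idea that related concepts are "located" near each other in latent space is commonly measured with cosine similarity between embedding vectors. The three-dimensional vectors below are invented toy numbers, not DALL-E's real embeddings (which are far higher-dimensional and learned from data); they only show how "baboon" can end up closer to "savanna" than to an unrelated word such as "spreadsheet".

```python
import math


def cosine_similarity(a, b):
    # cos(theta) = (a . b) / (|a| |b|): 1.0 means same direction, 0.0 means orthogonal.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy, hand-made embeddings (assumption for illustration only).
embeddings = {
    "baboon": [0.9, 0.8, 0.1],
    "savanna": [0.7, 0.9, 0.0],
    "spreadsheet": [0.0, 0.1, 0.95],
}


def nearest(word):
    # Return the other word whose embedding points in the most similar direction.
    others = (w for w in embeddings if w != word)
    return max(others, key=lambda w: cosine_similarity(embeddings[word], embeddings[w]))


print(nearest("baboon"))  # savanna
```

With these toy numbers the baboon/savanna similarity is about 0.98 and the baboon/spreadsheet similarity about 0.15, which is the kind of "strong association" the passage describes.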


What Excites Yoshua Bengio about the Future of Generative AI

#artificialintelligence

Many people who stumbled upon AI-generated images this past year may mistakenly believe that the buzzword 'generative' is brand new. But anyone who knows a little more about AI will know that the roots of today's generative AI go back to the advent of GANs. In 2014, a group of researchers including former Google Brain research scientist Ian Goodfellow, his professor and Turing Award winner Yoshua Bengio, and others released a paper on Generative Adversarial Networks, or GANs. They used neural networks in an imaginative way, pitting two networks against each other so that each constantly tries to outwit the other: a discriminator learns to tell real images from a training data set apart from fakes, while a generator learns to produce fakes that can fool it. Eventually, the generator can create new fake images that are sufficiently convincing.
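
The "outwitting" game described above is written in the 2014 GAN paper (Goodfellow et al.) as a two-player minimax objective. Here D is the discriminator, which outputs the probability that its input is a real image, G is the generator, which maps random noise z to a synthetic image, p_data is the distribution of real training images, and p_z is the noise distribution:

```latex
\min_G \max_D V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}(x)}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_z(z)}\big[\log\big(1 - D(G(z))\big)\big]
```

D is pushed to score real images near 1 and generated ones near 0, while G is pushed to make D(G(z)) approach 1. The same paper notes that, in practice, G is often trained to maximise log D(G(z)) instead, because that gives stronger gradients early in training when D easily rejects the generator's samples.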


Why this ChatGPT moment harks back to the original iPhone - Jack Of All Techs

#artificialintelligence

Exactly three weeks ago, OpenAI released ChatGPT. Since then, it has been nearly impossible to keep up with both the hyped-up excitement and brow-furrowing concerns around use cases for the text-generating chatbot, ranging from the fun (writing limericks and rap lyrics) and the clever (writing prompts for text-to-image generators like DALL-E and Stable Diffusion) to the dangerous (threat actors using it for generating phishing emails) and the game-changing (could Google's entire search model be upended?). Is it possible to compare this moment in the evolution of generative AI to any other technology development? According to Forrester Research AI/ML analyst Rowan Curran, it is.